Week 6: Advanced R Graphics and ggplot2

Advanced R Graphics

NCAA Basketball data

We will use game-level data from the NCAA basketball tournaments of 2011 through 2016.

## # A tibble: 402 x 34
##    Season Daynum Wteam Wscore Lteam Lscore Wloc  Numot  Wfgm  Wfga Wfgm3 Wfga3
##     <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1   2011    134  1155     70  1412     52 N         0    26    50     4    13
##  2   2011    134  1421     81  1114     77 N         1    27    54     4    12
##  3   2011    135  1427     70  1106     61 N         0    23    54     4    16
##  4   2011    135  1433     59  1425     46 N         0    20    59     9    24
##  5   2011    136  1139     60  1330     58 N         0    22    54     7    26
##  6   2011    136  1140     74  1459     66 N         0    24    61     6    22
##  7   2011    136  1153     78  1281     63 N         0    29    54     4    11
##  8   2011    136  1163     81  1137     52 N         0    32    66     9    24
##  9   2011    136  1196     79  1364     51 N         0    29    53     8    23
## 10   2011    136  1211     86  1385     71 N         0    28    52     9    15
## # … with 392 more rows, and 22 more variables: Wftm <dbl>, Wfta <dbl>,
## #   Wor <dbl>, Wdr <dbl>, Wast <dbl>, Wto <dbl>, Wstl <dbl>, Wblk <dbl>,
## #   Wpf <dbl>, Lfgm <dbl>, Lfga <dbl>, Lfgm3 <dbl>, Lfga3 <dbl>, Lftm <dbl>,
## #   Lfta <dbl>, Lor <dbl>, Ldr <dbl>, Last <dbl>, Lto <dbl>, Lstl <dbl>,
## #   Lblk <dbl>, Lpf <dbl>

Compute annual averages

## # A tibble: 6 x 5
##   Season Win.Points Lose.Points Win.3Pt Lose.3pt
##    <dbl>      <dbl>       <dbl>   <dbl>    <dbl>
## 1   2011       73.2        61.9    7.18     5.96
## 2   2012       71.4        61.5    5.97     5.93
## 3   2013       72.2        59.3    6.70     5.31
## 4   2014       73.9        62.9    6.19     5.46
## 5   2015       72.9        62.6    6.34     6.22
## 6   2016       78.3        65.4    7.18     6.52
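
One way a summary like this could be computed is with dplyr's `group_by()` and `summarise()`. The sketch below is an assumption about the approach, not the original code: it uses only the ten 2011 games visible in the printout above, the name `games` is illustrative, and it assumes `Win.3Pt` is the mean of `Wfgm3`. `Lose.3pt` is omitted because the `Lfgm3` values are not shown.

```r
library(dplyr)

# First ten games from the printout above (2011 season only);
# `games` is an illustrative name for the data set
games <- tibble(
  Season = rep(2011, 10),
  Wscore = c(70, 81, 70, 59, 60, 74, 78, 81, 79, 86),
  Lscore = c(52, 77, 61, 46, 58, 66, 63, 52, 51, 71),
  Wfgm3  = c(4, 4, 4, 9, 7, 6, 4, 9, 8, 9)
)

# One row per season, averaging over all games in that season
annual <- games %>%
  group_by(Season) %>%
  summarise(
    Win.Points  = mean(Wscore),
    Lose.Points = mean(Lscore),
    Win.3Pt     = mean(Wfgm3)  # assumed: made three-pointers by the winner
  )
annual
```

With the full 402-game data set the same pipeline would return one row for each of the six seasons.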

Plot types: Points

Plot types: Points Code
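
A minimal sketch of a base R point plot. It assumes the annual averages are stored in a data frame called `seasons` (a name chosen here for illustration), with values copied from the summary table above.

```r
# Annual averages copied from the summary table; `seasons` is an illustrative name
seasons <- data.frame(
  Season     = 2011:2016,
  Win.Points = c(73.2, 71.4, 72.2, 73.9, 72.9, 78.3)
)

# type = "p" draws one point per season
plot(seasons$Season, seasons$Win.Points, type = "p",
     xlab = "Season", ylab = "Average points",
     main = "Average points scored by winning teams")
```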

Plot types: Bars

Plot types: Bars Code
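
A comparable sketch for a bar chart using `barplot()`, again assuming an illustrative `seasons` data frame built from the summary table above.

```r
# Annual averages copied from the summary table; `seasons` is an illustrative name
seasons <- data.frame(
  Season     = 2011:2016,
  Win.Points = c(73.2, 71.4, 72.2, 73.9, 72.9, 78.3)
)

# barplot() returns the x positions of the bar midpoints,
# which is handy when adding labels later
bar_mids <- barplot(seasons$Win.Points, names.arg = seasons$Season,
                    xlab = "Season", ylab = "Average points",
                    main = "Average points scored by winning teams")
```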

Plot types: Lines

Plot types: Lines Code
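
A line plot only requires changing the `type` argument of `plot()`. This sketch assumes the same illustrative `seasons` data frame, with values taken from the summary table above.

```r
# Annual averages copied from the summary table; `seasons` is an illustrative name
seasons <- data.frame(
  Season     = 2011:2016,
  Win.Points = c(73.2, 71.4, 72.2, 73.9, 72.9, 78.3)
)

# type = "l" connects the observations with line segments
plot(seasons$Season, seasons$Win.Points, type = "l",
     xlab = "Season", ylab = "Average points",
     main = "Average points scored by winning teams")
```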

Lines and Legends

Lines and Legends: Code
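
A sketch of how a second series and a legend might be added with `lines()` and `legend()`. The data frame name `seasons`, the colors, and the axis limits are choices made for this illustration.

```r
# Annual averages copied from the summary table; `seasons` is an illustrative name
seasons <- data.frame(
  Season      = 2011:2016,
  Win.Points  = c(73.2, 71.4, 72.2, 73.9, 72.9, 78.3),
  Lose.Points = c(61.9, 61.5, 59.3, 62.9, 62.6, 65.4)
)

# Set ylim up front so that both series fit in the plotting region
plot(seasons$Season, seasons$Win.Points, type = "l", col = "blue",
     ylim = c(55, 80), xlab = "Season", ylab = "Average points",
     main = "Tournament scoring by season")

# lines() adds to the existing plot instead of starting a new one
lines(seasons$Season, seasons$Lose.Points, col = "red", lty = 2)

# The legend entries must be listed in the same order as lty and col
legend("topleft", legend = c("Winning team", "Losing team"),
       col = c("blue", "red"), lty = c(1, 2))
```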

Points

Points: Code
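
Points can be layered onto an existing plot with `points()`, using `pch` to select the plotting symbol. This is a sketch under the same assumptions as before (illustrative `seasons` data frame, values from the summary table).

```r
# Annual averages copied from the summary table; `seasons` is an illustrative name
seasons <- data.frame(
  Season      = 2011:2016,
  Win.Points  = c(73.2, 71.4, 72.2, 73.9, 72.9, 78.3),
  Lose.Points = c(61.9, 61.5, 59.3, 62.9, 62.6, 65.4)
)

plot(seasons$Season, seasons$Win.Points, type = "l",
     ylim = c(55, 80), xlab = "Season", ylab = "Average points")
lines(seasons$Season, seasons$Lose.Points, lty = 2)

# pch = 16 is a filled circle, pch = 17 a filled triangle
points(seasons$Season, seasons$Win.Points, pch = 16, col = "blue")
points(seasons$Season, seasons$Lose.Points, pch = 17, col = "red")
```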

Annotation

Annotation: Code
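
One simple way to annotate a base R plot is `text()`, which places a label at given coordinates. The sketch below labels the highest-scoring season; the `seasons` and `peak` names are illustrative.

```r
# Annual averages copied from the summary table; `seasons` is an illustrative name
seasons <- data.frame(
  Season     = 2011:2016,
  Win.Points = c(73.2, 71.4, 72.2, 73.9, 72.9, 78.3)
)

plot(seasons$Season, seasons$Win.Points, type = "b",
     ylim = c(70, 80), xlab = "Season", ylab = "Average points")

# Find the season with the highest average and label it;
# pos = 2 puts the text to the left of the point
peak <- seasons[which.max(seasons$Win.Points), ]
text(peak$Season, peak$Win.Points,
     labels = paste0("Peak: ", peak$Win.Points), pos = 2)
```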

Axes

Axes: Code
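
Custom axes can be drawn by suppressing the defaults with `axes = FALSE` and then calling `axis()` for each side. A minimal sketch, assuming the illustrative `seasons` data frame; the tick positions are arbitrary choices.

```r
# Annual averages copied from the summary table; `seasons` is an illustrative name
seasons <- data.frame(
  Season     = 2011:2016,
  Win.Points = c(73.2, 71.4, 72.2, 73.9, 72.9, 78.3)
)

# axes = FALSE suppresses both default axes and the surrounding box
plot(seasons$Season, seasons$Win.Points, type = "b", axes = FALSE,
     ylim = c(70, 80), xlab = "Season", ylab = "Average points")

# side 1 = bottom, side 2 = left; las = 1 keeps tick labels horizontal
axis(1, at = seasons$Season)
y_ticks <- seq(70, 80, by = 2)
axis(2, at = y_ticks, las = 1)
```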

Axes Box

Axes Box: Code
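
When a plot is drawn with `axes = FALSE`, the frame around the plotting region disappears as well. A sketch of restoring it with `box()`, under the same assumptions as the earlier examples:

```r
# Annual averages copied from the summary table; `seasons` is an illustrative name
seasons <- data.frame(
  Season     = 2011:2016,
  Win.Points = c(73.2, 71.4, 72.2, 73.9, 72.9, 78.3)
)

plot(seasons$Season, seasons$Win.Points, type = "b", axes = FALSE,
     ylim = c(70, 80), xlab = "Season", ylab = "Average points")
axis(1, at = seasons$Season)
axis(2, las = 1)

# box() draws the frame around the plotting region
box()
```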

Exercise: Advanced Plotting

Use the Seattle Housing data set (http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv) to create an interesting graphic. Include informative titles and labels, and add an annotation.

## Parsed with column specification:
## cols(
##   price = col_double(),
##   bedrooms = col_double(),
##   bathrooms = col_double(),
##   sqft_living = col_double(),
##   sqft_lot = col_double(),
##   floors = col_double(),
##   waterfront = col_double(),
##   sqft_above = col_double(),
##   sqft_basement = col_double(),
##   zipcode = col_double(),
##   lat = col_double(),
##   long = col_double(),
##   yr_sold = col_double(),
##   mn_sold = col_double()
## )

Solution: Advanced Plotting

Solution: Advanced Plotting Code

Superimposed Plots

Superimposed Plots: Code
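
One common form of superimposed plot is a density curve drawn over a histogram. The sketch below is an illustration rather than the original example: it uses the ten winning scores visible in the NCAA printout earlier, and the names `wscore` and `score_density` are made up here.

```r
# Winning scores from the first ten games in the printout
wscore <- c(70, 81, 70, 59, 60, 74, 78, 81, 79, 86)

# freq = FALSE puts the histogram on the density scale so the
# curve and the bars share the same y-axis units
hist(wscore, freq = FALSE, xlab = "Winning score",
     main = "Winning scores with kernel density estimate")

# lines() superimposes the estimated density on the open plot
score_density <- density(wscore)
lines(score_density, lwd = 2)
```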

Expression / Text
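
Mathematical notation in titles, axis labels, and annotations can be produced with `expression()`, which uses R's plotmath syntax. A small sketch using the standard normal density as a stand-in example (the example itself is an assumption, not taken from the slides):

```r
x <- seq(-3, 3, length.out = 100)

# == renders as an equals sign; Greek letter names render as symbols
plot(x, dnorm(x), type = "l", ylab = expression(phi(x)),
     main = expression(paste("Standard normal density: ",
                             mu == 0, ", ", sigma == 1)))

# The same syntax works for text placed inside the plot
density_label <- expression(frac(1, sqrt(2 * pi)) * e^{-x^2 / 2})
text(-2, 0.3, density_label)
```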

ggplot2

ggplot2 Overview

Why ggplot2?

Advantages of ggplot2

  • consistent underlying grammar of graphics (Wilkinson, 2005)
  • plot specification at a high level of abstraction
  • very flexible
  • theme system for polishing plot appearance

Grammar of Graphics

The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want.

Building blocks of a graph include:

  • data
  • aesthetic mapping
  • geometric object
  • statistical transformations
  • faceting

ggplot2 vs. Base Graphics

Compared to base graphics, ggplot2

  • is more verbose for simple / canned graphics
  • is less verbose for complex / custom graphics
  • does not provide plot methods for individual object classes (data should always be in a data.frame)
  • uses a different system for adding plot elements

Aesthetic Mapping

Aesthetics are things that you can see. Examples include:

  • position (i.e., on the x and y axes)
  • color (“outside” color)
  • fill (“inside” color)
  • shape (of points)
  • linetype
  • size

Aesthetic mappings are set with the aes() function.
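
As a minimal sketch of `aes()`, the code below maps three variables to position and color. It assumes the `mpg` data set that ships with ggplot2 as a stand-in for the course data; `p_aes` is an illustrative name.

```r
library(ggplot2)

# mpg ships with ggplot2 and is used here only as a stand-in data set.
# displ -> x position, hwy -> y position, drv -> point color
p_aes <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()
p_aes
```

Because `drv` is mapped inside `aes()`, ggplot2 picks the colors and builds the legend automatically.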

Geometric Objects (geom)

Geometric objects are the actual marks we put on a plot. Examples include:

  • points (geom_point)
  • lines (geom_line)
  • boxplot (geom_boxplot)

A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.

Graphical Primitives / ggplot

Adding Geoms: geom_point()
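
A sketch of building a scatterplot in two steps: first the data and aesthetic mapping, then a point layer added with `+`. The `mpg` data set from ggplot2 is assumed as a stand-in, and the object names are illustrative.

```r
library(ggplot2)

# Base object: data and mapping only, no marks yet
p <- ggplot(mpg, aes(x = displ, y = hwy))

# Adding geom_point() draws one point per row of the data
p_points <- p + geom_point()
p_points
```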

Adding Geoms: geom_smooth()
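
`geom_smooth()` adds a fitted trend and, by default, a confidence band. The sketch below assumes the `mpg` stand-in data and picks a linear fit explicitly; the choice of `method = "lm"` is an assumption made for illustration.

```r
library(ggplot2)

# Points plus a fitted straight line with its confidence band
p_smooth <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
p_smooth
```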

Adding Geoms: geom_rug()
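
`geom_rug()` draws small tick marks along the axes to show the marginal distribution of each variable. A minimal sketch with the `mpg` stand-in data:

```r
library(ggplot2)

# sides = "bl" draws the rug on the bottom and left edges
p_rug <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_rug(sides = "bl")
p_rug
```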

Adding Geoms: geom_density2d()
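
`geom_density2d()` (also available as `geom_density_2d()`) overlays contour lines of a two-dimensional kernel density estimate. A sketch using the `mpg` stand-in data:

```r
library(ggplot2)

# Contours mark regions where the points are most concentrated
p_contour <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_density2d()
p_contour
```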

Adding Geoms: geom_jitter()
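
When many observations share the same coordinates, `geom_jitter()` adds a small amount of random noise so overlapping points become visible. A sketch with the `mpg` stand-in data, where city and highway mileage are recorded as integers; the jitter widths are arbitrary.

```r
library(ggplot2)

# width and height control the maximum shift in each direction
p_jitter <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.3, height = 0.3)
p_jitter
```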

Adding Geoms: labs()
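
Titles and axis labels are set with `labs()`. A minimal sketch, with label text written for the `mpg` stand-in data:

```r
library(ggplot2)

p_labs <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Highway mileage by engine size",
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon")
p_labs
```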

Scales: xlim() and ylim()
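
`xlim()` and `ylim()` set the axis ranges. Note that observations falling outside the limits are dropped with a warning rather than clipped. The limits below are chosen to contain all of the `mpg` stand-in data.

```r
library(ggplot2)

# displ ranges from 1.6 to 7 and hwy from 12 to 44,
# so no observations are dropped by these limits
p_limits <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  xlim(1, 8) +
  ylim(0, 50)
p_limits
```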

Themes

A wide range of themes is available in ggplot2; see the theme overview.
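
A theme is applied by adding it to the plot like any other component. The sketch below uses two of the built-in themes on the `mpg` stand-in data; the object names are illustrative.

```r
library(ggplot2)

p_base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Same plot, two different built-in themes
p_bw      <- p_base + theme_bw()
p_minimal <- p_base + theme_minimal()
p_bw
```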

More about aes

More about aes: Comment
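
One point that often needs a comment is the difference between mapping an aesthetic to a variable inside `aes()` and setting it to a constant outside `aes()`. The sketch below illustrates that distinction as an assumption about what is relevant here, using the `mpg` stand-in data.

```r
library(ggplot2)

# Mapped: color varies with drv and a legend is created
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv))

# Set: every point is blue and there is no legend
p_fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

p_mapped
p_fixed
```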

Faceting
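
Faceting splits one plot into a grid of panels, one per level of a variable. A minimal sketch with `facet_wrap()` on the `mpg` stand-in data, where `class` is the vehicle type:

```r
library(ggplot2)

# One panel per vehicle class, all sharing the same axes
p_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class)
p_facet
```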

Faceting: Comment

Exercise: ggplot2

Now use ggplot2 to create an interesting graph using the Seattle Housing data set.

Solution: ggplot2
